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The probability integral of the multivariate normal distribution has received con* 
siderable attention since Sheppard (1900) and Pearson (1901) published their serrinal 
work on the bivariate normal distribution. In the general case, we are concerned with 
evaluating 
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where {pij} represents the n x n symmetric correlation matrix of the x,''s, c>.d 
fi^u ^3) • • • ) ^n; {Pt/}) is standardized multivariate normal density function, Ci- 
rect evaluation of Fn is only possible for special cases of {pij}. For example, Dunnett 
and Sobel (1955) have shown that when pij = <^cii{i¥^j)i where | oe, |< 1, then 
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where 9 represents the univariate normal distribution function. This special case 
is the basis for much of item-response theory. More recently, however, Bohrer and 
Schervish (1981), have developed an error bounded algorithm for evaluating Fn for 
general {Pty}. Computationally, this algorithm is restricted to n 7, and even at 
n 7, it can require as much as 24 hours to compute a single probability with 10*"^ 
accuracy on a computer ehan is capable of approximately 1*2 million scalar floating 
point operations per second. 

The purpose of this report is to present a fast and general M)proximation for 
rectangular regions of the multivariate normal distribution function, that is based 
on Clarices (1961) approximation to the moments of the maximum of n j<Hntly nor- 
mal random variables. The peribrmance of this approximation is then compared to 
special cases in which the exact results are known (e.;., fii^^p^ .5), cases in which 
the integral reduces to a unidimensional quadrature evaluation (e.;., p^y ^ <^*<ky), 
and finally error bounded reduction formulae for {/>,-/} sad n < 7. 
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ABSTRACT 



The probability integral of the multi\*ariate normal distribution has received 
considerable attention since Sheppard (1900) and Pearson (1901) published 
their seminal work on the bivariate normal distribution. In the general case, 
we are concerned with evaluating 

/hi fh2 fhn 
/ • • • / /(Xl, X2 Xn\ {pij])dXx ...dXn 

where {pij} represents then x n correlation matrix of the x^'s, and /(xi,X2, . .Xn\ {Ptj}) 
is the standardized multivariate normal density function. Direct evaluation of 
Fn is only possible for special cases of {pij}. For exrmple, Dunnett and Sobel 
(1955) have shown that when p^j = a,Q;(z ^ j), where | a,- |< L then 

where <I> represents the univariate normal distribution function. This special 
case is the basis for much of modern psychometric theory. More recently, 
however, Bdhrer and Schervish (19S1), have developed an error-bounded algo- 
rithm for evaluating Fn for general {ptj}. Computationally, this algorithm is 
restricted to n = 7. and even at n = 7. it can require as much as 24 hours to 
compute a single probability with 10"'^ accuracy on a computer than is capable 
of approximately 1-2 million scalar floating point operations per second. 

The purpose of this report is to present a fast and general approximation 
lor rectangular regions of the multivariate normal distribution function based 
on Clark's (1961) approximation to the moments of the maximum of n jointly 
normal random variable.>. The performance of this approximation compared to 
special cases in which the exact results are known and error-bounded reduction 
formulae show the accuracy of the approximation to be adequate for many 
practical applications where multivariate normal probabilities are required. 
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1 Introduction 



The probability integral of the multivariate normal distribution has re sived 
considerable attention since Sheppard (1900) and Pearson (1901) published 
their seminal work on the bivariate normal distribution. In the general case, 
we are concerned with evaluating 

^nihx.h, = r' .../'" /(Xl,X2 Xr,;{p,j})d.T^...dXn 

(1) 

where {p,j} represents the n x n symmetric correlation matrix of the x,>, and 
/t-ft'J^2 ■Xn\{Pij}) is the standardized multivariate nornial density func- 
tion. Direct evaluation of is only possible for special oases of {pij}. For 
•example. Dunnett and Sobel (195o) have shown that when p,j = q,Qj(z ^ ;). 
where | q, ]< 1. then 



•'-^ L.=l Vvl-QV. 



f{y)d{y) 



(2) 



where <I> represents the univariate normal distribution function. The probabil- 
ity in equation (2) can be approximated to any practical degree of accuracy 
using Gauss- Hermite quadrature (Stroud and Sechrest, 1966). It should be 
noted that when p,j = p foi i.j. then 



f{y)d{y) 



(3) 



and if p = .3 and h = 0. 



F.iO.O 0:{..5}) = 



n + l 



(4) 



More recently, however, Bohrer and Schervish (1981). have developed an 
error-bounded algorithm for evaluating F„ for general {p,j}. Computationally, 
thip algorithm is restricted to n = 7. and even at n = 7. it can require as much 
as 24 hours to compute a single probability with 10"^ accuracy on a computer 
than is capable of approximately 1-2 million scalar floating point operations 
per second. It is unclear whether vectorization of this algorithm is possible, so 
that the greatly increased speeds of parallel computing environments could be 
exploited ( e.g.. 20-80 million floating point instructions per second). Even still, 
it is unhkely that this algorithm would be computational tractable for n > 10. 



An alternate approach to approximating Fn, can be obtained by noting 
that 

Fn ^ Pr{xi <hi,X2<h2,..,Xn< hn) (5) 

U hi...K = h s= 0. and the x, follow a standardized multivariate normal 
distribution. /^|? is a so-called "orthant" probability. Note, however, that this 
orthant probability is equivalent to 

F^ = Pr{max(xx xj < 0} (6) 

If max(xi Xn) were normally distributed, which it clearly is not, with mean 

£[max(x, Xn)] and variance l''(max(x, x^)), then, 

h - E(max(xx 



yjV(71iax{Xx I'n) 



(7) 



where in this case h = 0. For the more general rectangular region case of hiy 
we could set /i = 0 and subtract /i, from the mean values of each of the Xj, 
which to this point have been expressed in standardized form. 

To use this algorithm, we must first have an accurate method of computing 

the first two moments of max(x^ Vn) where the x, have a joint multivariate 

normal distribution with general correlation } , and some bound on the error 

introduced by assuming that max(x, x^) has a normal distribution. Such 

an approximation has been described by Clark (1961), and in the following, 

we describe its use in connection with evaluating F„(.ri,X2 x„;{p,j}). We 

begin by reviewing Clark's original formulae. 



2 The Clark Algorithm 

Let any three successive components from an n-variate vector, y,. be dis- 
tributed: 

. .'/<+.' 

Let y^ — max(i/,) = ^)^, and compute the probability that > as 
follows: 

set = (/J, -/z,+t)/C.+i. 

where (,"\, = c; + a^^^^- lc^a^^i p,^,^x- 

Then > (/) = P(«/.^i - i? > 0) ' 







r 
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the value of the univariate normal distribution function at the standard deviate 

Now let y,>i = max(yn y»>i ) and assume (as an approximation) that i/i+i ) 
is bivariate normal with means, 

= = /^i*(:^i+i) + A^i>i*(-^i+i) + Ci+i</>(^»+i)i 

variances 



where 



(10) 



and correlation 



/. . <7,/)i.,+2$(c,+ i) + (J,+l/>,+ i.,+2<&(-C,+ i) 

= 7^ — ^ . (U) 



Then. 



/"(y.+i = niax( ?/,.(/,+ ,. )/,+ ,)) :^ P((j/,+2 - J/i+i > 0)n (t/,+2 -yj > 0)) 

(12) 

is approximated by 

P(y,+i > = F(y,+2 -y,+ i > 0) 



= * 



/'.+2 - /'(y.+i) 



, V^+i + "^'ly.+i ) - 2(7,+2<T(y,+i )p(y,+i, J/,+2), 



(13) 



Assuming as a working approximation that j/j+i is normally distributed 
with the above mean .and variance, we may therefore proceed, recursively from 
2 = 1 to i = n - I. where t/n+i is an independent dunfimy variate with mean 
zero knd variance zero (i.e. ^/n+i = 0). Then, for example. 
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P(yn+i = maxlyi, y2, . • • , yn+i)] 
= P((y„*i-yi>0)n(y„^x-yj>0)n...n(y„*,-yn>0)] 

= p[(-y,>o)n(-y2>o)n...n(-yn>o)] U*) 

.nnroximates the probabilitv of the negative orthant of the specified multi- 
L r no^al dLribution. In the case of n correlated s-dard -^^^^^^^ 
negative and positive orthant probabilities are identical. The probability of any 
h'" onh^nt'can be obtained by reversing the signs of the vanat^ corr^^^^^^^^^^^ 
inff to I's in the orthant pattern. Of course, y^+i is not normally distributed 
Errors pXed by substituting normal approximations for the mou.ents of 
are discussed in the, following section. 

.More generally, to compute a multivariate normal probability over an n 
dimensional rectangular region, for example. 



(15) 



we compute the negative orthant setting ;.n*, = h. Finally to approximate the 
integral for general h,. w. compute the negative orthant by setting /in+i - 0 
and fit = - 

3 Accuracy of the Clark Approximation 

The errors of the Clark approximation result from the replacement non- 
normal distributions by normal approximations. For example, suppo^ that 
v-,-e are intereslea in the maximum of four standard normal variables, e 
n.ax(y,.y3,y3.y.). By assuming that y, is normally distributed with expected 
vle'£[max(y. y:)] and variance V(max(y..y.)l, we can then use he mo- 
rrents of max(y,.y3) as an approximation for those of max(yi,y2,y3). i^exi, 
Tassl tha 3 is normally distributed with expectation and vanance ecjua^ 
to the corresponding moments of max{h.y.l and can therefore use the mo- 
ments of max(y3, y^) as an approximation for those oi max(y,, . . _ . yj^ In h 
example of course, h and y, are not normally distributed. Furthermore, this 
Ta rare' cLe in wh ch the distribution of a statistical v^riate diverges from 
o ma tv as sample size increases. Tippet (1925) first showed tha skewnes 
and kurtosis of the maximum of n standard normals goes form .019 and .62 
respectivelv for n ■= > to .-:-29 and .765 for n = 100 to 618 and l.O.b for n - 
lOOO: In terms of expected values of n standard normal -;-bles, t e effect o 
this non-normality is quite small. For example, for n = 10, the true value is 



1.5388 and the approxtmatiou yields 1.5367. Even for n ss 1000 the expected 
value is 3.2414 and the \pproximated value is 3.2457. 

The effect of non-nor.nality on the accuracy of the approximation is also 
dependent on the difference between £(y,-i,y,). For example,, suppose we 
wish to approximate the moments of maar(j/i,j/2) where yi and j/j are not 
normally distributed. Clark (1961) points out that if the difference E{yi) - 
£(1/2) is large relative to the greater of and V"/^(y2) the random 

variable max(yi.y2) is almost identical to j/i. Certainly the first two moments 
of max{yi,y2) would be minimally affected by replacing yi and yj by normal 
approximations. However if E(yi) - E{y2) is small relative to the respective 
standard deviations, then the use of normal approximations could conceivably 
result in significant errors in the appro.ximation of the mean and va-iance of 
their ma.ximum. 

In light of this, the following illustrations of the accuracy of the Clark 
appro.ximation are. in fact, the worst case results, since they repirisent the case 
in which the expected \ alues of the 1/, are equal. These results indicate that the 
error bound for the Clark approximation is approximately 10"^, as illustrated 
in the following section. 

4 Illustration 

To evaluate the performance of this algorithm, we have e.xamined a series of 
equa-correlated multivariate normal distributions for which e.xact results are 
known (see Gupta. 1963) and those considered by Schervish (1984). Table lA 
displays results for "J to 7 equa-correlated standard normal random variables 
with selected values of p = .2. .3. .8 and .9, and upper integration bounds 
of 0. 1. and 2. Inspection of the tabled probabilities reveals that the Clark 
algoiithm is generally accurate to at least 10"^ and that computational times 
are a linear function of dimensionality. The speed of the Clark algorithm does 
net depend on p. In contrast, the speed of .\IULXOR (Schevish, 1984) is expo- 
nentially increasing with both dimensionality and p. Ir the 7-variate normal 
case with p,j = .9, MUL.VOR required almost a day to compute a probability 
which was a:curate to 2 x 10-^ whereas the Clark algorithm computed the 
same probability with 4 x lO"-* accuracy in less than three thousandths of a 
■second. Inspection of these results and others not reported here, suggest that 
the accuracy of the Clark approximation increases with increasing p. 

Table IB displays results for orthant probabilities of higher dimensional 
integrals (n =^iO\ 20. and 40.), for the special case o{ p = .5. where = 
l/(n + I). .Again, results are accurate to at least 10"^. and computational 
times are lineai^in n. MULNOR could not be used to evaluate integrals of this 
dimensionalitv. 

' O 
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Finally, Table IC displays results for some tail ptobabilities of the multi* 
vallate normal distribution. In this case, the upper bound of the integration 
was -2.5, n =: (3,5,10)^ and p = (.5„9). These probabilities ranged from IQ-^ 
to 10"* and accuracy of the Clark approximation was 10*' in all cases. 

5 Discussion 

Clark's (1961) formulae for the moments of the maximum of n correlated ran- 
dom normal variates can clearly be used to obtain fast and accurate ap- 
proximation to multivariate norma), probabilities. Ex ^miration of a series of 
examples involving special cases in which the true results are known, reveals 
that the error bound foi the approximation is approximately lO*"^ regardless 
of dimensionality, and th<^t accuracy increases with increases in | p |, These 
ref ults are conservative in that we would expect the ill effect of using normal 
approximations to be jji-' ♦est when /(, = = l,n) which is the case used 
in the illustrations. 

In terms of computatioaal speed, the Clark approximation is clearly un- 
paralleled. A reasonable estimate of the speed of the Clark algorithm is given 
by, 

/.0004(u)\ 

soeed = — seconds 

\ neganop/ 

where megaflop is the number of scalar floating point instructions per second 
that the computer is capable of performing. 

Xumeious applications of the Clark algorithm suggest themselves. Some 
preliminary work in this area has already been conducted by Daganzo (1984)^ 
in the context of discrete choice models of consumer behavior, and by Gibbons, 
Bock and Hedeker (1987) in item-response theory. Other potential applications 
include multivariate generali/'ations of probit analysis (see .Ashford and Sow- 
den, 1970 for the bivariate case), and random-effect probit models (Gibbons 
and Bock. 1987), where the Clark approximation was used to estimate first- 
order autocorrelation among the residual errors. 

.Another area of potential interest is in the approximation of multivariate 
t probabilities, which can be considered as the joint distribution of n variates 
f, = (: = 1, 2, . . . , n ) where the c, have a multivariate normal distribution 
with zero means and unknown variance (T^, and known correlation malrix {pij}* 
while us^ja'^ has a \^ distribution with u degrees of freedom and is independent 
of the z,, Dunnett (1955) has evaluated this joint density for the case of 

p^^ = p ^ Jy^ by obtaining 'Fn(ci. C2 ^n'APh)) i^itegrating out s. Use 

of the Clark algorithm would provide a generali;;ation of their result to the 

I 
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case of gencr^d {/>,,}, & natural application of which would be a generalization 
of Dunnett*8 test to the case of unequal sample sizes among the fc + I groups 
(i.e., treatment groups and a single control). 
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Table 1 



Prrbability that n Standard Normal Random Variables with 
Common Correlation are Simultaneously < h. 

A, Comparison with MULNOR 



MULXOR Clark 



n 


h 


P 


True' 


Prob 


Time^ 


Time^ 


Prob 


Time"' 


:} 


•2.0 


.9 


.96170 


.96170 


.196 


.060 


.96185 


.0008 


4 


2.0 


.0 


.92845 


.92845 


7.275 


1.760 


.93088 


.0012 


4 


2.0 


.8 


.947-59 


.94758 


13.735 


2.913 


.94819 


.0012 


4 


2.0 


9 


.95708 


.95707 


18.557 


3.855 


.95730 


.0012 




l.O 


.3 


52111 


.52113 


40.161 


8.900 


.51341 


.0016 


7' 


0.0 


.9 


.32967 


.:J2965 


.\A 


98040 


.32921 


.0026 


7' 


0.0 


•> _ 


.04043 


.04038 


NA 


733 


.04122 


.0026 



» Gupta (1963) 

- Seconds on a DEC 2060 

3 Seconds on a COMPAQ 386-25. Weitek 3167, SVS FORTRAN 
^ Accuracy set to 10"^ instead of 10"^ for MULNOR 



B. Higher Dimensional Integrals 











Clark 


n 


h 


P 


True' 


Prob Time 


10 


0 


.5 


.09091 


.08907 .0044 


20 


0 


.5 


.04762 


.04657 .0134 


40 


0 


.5 


.02439 


.02390 .0459 



'^nlO.O 0:{..5})=;^ 



C. Tail Probabilities 



n 


h 


P 


True' 


Clark 


3 


-2.5 


.5 


.0001- 


.00021 


3 


•2.5 


.9 


.00230 


.00231 


5 


-2.5 


.5 


.00003 


.00004 




-2.5 


.9 


.00157 


.001-56 


10 


-2.5 


.5 


.00000 


.00000 


10 


-2.5 


.9 


.00099 


.00098 



' Gupta (1963) 
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